Optical Character Recognition (OCR)

Introduction

OCR runs while files are being ingested, converting image files into text documents that can then be analysed in Sintelix.

Compatible Formats

OCR can convert documents in compatible formats, including JPEG, PNG, GIF, TIFF, BMP, and scanned PDF.

Requirements

OCR capability requires:

Using the OCR feature:

Once OCR has been enabled in the Ingestion Configuration, it is automatically applied to any image files added to a collection for ingestion.
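The following is a minimal sketch of the ingestion rule described above. It is an illustration only, not Sintelix's actual implementation. It assumes that compatibility is judged by file extension, and it uses a hypothetical `pdf_has_text_layer` flag to stand in for however Sintelix distinguishes scanned PDFs from PDFs that already contain text. The function name `needs_ocr` is also hypothetical.

```python
from pathlib import Path

# Image formats listed as OCR-compatible. Checking by extension is an assumption.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".tif", ".tiff", ".bmp"}


def needs_ocr(filename: str, ocr_enabled: bool, pdf_has_text_layer: bool = True) -> bool:
    """Hypothetical rule: decide whether an ingested file would be sent to OCR.

    pdf_has_text_layer is a hypothetical flag standing in for however a
    scanned PDF (one with no extractable text) is detected.
    """
    if not ocr_enabled:
        # OCR only runs once it has been enabled in the Ingestion Configuration.
        return False
    suffix = Path(filename).suffix.lower()  # case-insensitive extension match
    if suffix in IMAGE_EXTENSIONS:
        # Compatible image files are processed automatically.
        return True
    if suffix == ".pdf":
        # Assumption: only scanned PDFs (no text layer) need OCR.
        return not pdf_has_text_layer
    # Any other format is ingested without OCR.
    return False


if __name__ == "__main__":
    print(needs_ocr("invoice.PNG", ocr_enabled=True))                         # True
    print(needs_ocr("scan.pdf", ocr_enabled=True, pdf_has_text_layer=False))  # True
    print(needs_ocr("report.pdf", ocr_enabled=True))                          # False
    print(needs_ocr("photo.jpg", ocr_enabled=False))                          # False
```

Keeping the enablement check first mirrors the point that OCR does nothing until it is switched on in the Ingestion Configuration, regardless of file type.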

Once a document has been processed, you can edit its text and export it just like any other document.

Forms:

When ingesting forms, you can:

  • in the OCR Ingestion options, select the option to mark fields in documents that contain forms, or

  • define a PDF form configuration specifying how form fields are to be marked up in the document, then add that configuration to the Ingestion Configuration (see PDF Form Ingestion).